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ABSTRACT 

The non-Gaussian nature of the epoch of reionization (EoR) 21-cm signal has a significant 
impact on the error variance of its power spectrum P(k). We have used a large ensemble 
of semi-numerical simulations and an analytical model to estimate the effect of this non- 
Gaussianity on the entire error-covariance matrix . Our analytical model shows that C rj has 
contributions from two sources. One is the usual variance for a Gaussian random field which 
scales inversely of the number of modes that goes into the estimation of P(k). The other is the 
trispectrum of the signal. Using the simulated 21-cm signal ensemble, an ensemble of the ran¬ 
domized signal and ensembles of Gaussian random ensembles we have quantified the effect 
of the trispectrum on the error variance Cu. We find that its relative contribution is comparable 
to or larger than that of the Gaussian term for the k range 0.3 < k < 1.0 Mpc , and can 
be even ~ 200 times larger at k ~ 5 Mpc -1 . We also establish that the off-diagonal terms of 
Cij have statistically significant non-zero values which arise purely from the trispectrum. This 
further signifies that the error in different k modes are not independent. We find a strong cor¬ 
relation between the errors at large k values (> 0.5 Mpc -1 ), and a weak correlation between 
the smallest and largest k values. There is also a small anti-correlation between the errors in 
the smallest and intermediate k values. These results are relevant for the k range that will be 
probed by the current and upcoming EoR 21-cm experiments. 

Key words: methods: statistical - cosmology: theory - dark ages, reionization, first stars - 
diffuse radiation. 


1 INTRODUCTION 

The epoch of reionization (EoR) is one of the least known but im¬ 
portant periods in the history of our Universe. During this epoch the 
diffused hydrogen in our universe gradually changed from neutral 
to ionized. Our current knowledge about this epoch is very lim¬ 
ited. The measurements of the Thomson scattering optical depth of 
the cosmic microwave background (CMB) photons from the free 
electrons in the intergalactic media (IGM) (e.g. see Komatsu et al. 
2011; Planck Collaboration et al. 2014; Planck Collaboration 2015 
etc.) and the observations of the Lyman-a absorption spectra of the 
high redshift quasars (e.g. seeBecker et al. 2001; Fan et al. 2003; 
White et al. 2003; Goto et al. 2011; Becker et al. 2015 etc.) sug¬ 
gest that this epoch was probably extended over a wide redshift 
range 6 < 3 < 12 (see e.g. Mitra, Choudhury & Ferrara 2011, 
2015; Mitra, Ferrara & Choudhury 2013; Robertson et al. 2013, 
2015). However, many fundamental issues such as the characteris¬ 
tics of the major ionizing sources, the precise duration and timing 
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of reionization and the topology of the neutral hydrogen (Hi) dis¬ 
tribution etc. cannot be resolved using these indirect observations. 

Observation of the redshifted Hi 21-cm signal which provides 
a direct window to the state of the hydrogen in the IGM is a very 
promising probe of the EoR. There is a considerable effort under¬ 
way to detect the EoR 21-cm signal through radio interferometry 
using e.g. the GMRT 1 (Paciga et al., 2013). LOFAR 2 (van Haarlem 
et al., 2013; Yatawatta et al., 2013), MWA 3 (Bowman et al., 2013; 
Tingay et al., 2013; Dillon et al., 2014) and PAPER 4 (Parsons et al., 
2014; Ali et al., 2015; Jacobs et al., 2015). Apart from these first- 
generation radio interferometers, the detection of this signal is also 
one of the key science goals of the future telescopes such as the 
SKA 5 (Mellema et al., 2013; Koopmans et al., 2015) and HERA 6 
(Furlanetto et al., 2009). The Hi 21-cm signal is expected to be very 

1 http://www.gmrt.ncra.tifr.res.in 

2 http://www.lofar.org/ 

3 http://www.haystack.mit.edu/ast/arrays/mwa/ 

4 http://eor.berkeley.edu/ 

5 http://www.skatelescope.org/ 

6 http://reionization.org/ 
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weak (~ 4 — 5 orders in magnitude) compared to the enormous 
amount of foreground emission, from our own galaxy and the ex- 
tragalactic sources, in which it is buried (Di Matteo et al., 2002; 
Gleser, Nusser & Benson, 2008; Ali, Bharadwaj & Chengalur, 
2008; Jelic et al., 2008; Bernardi et al., 2009; Ghosh et al., 2012; 
Pober et al., 2013; Moore et al., 2013, 2015). Mainly these fore¬ 
grounds, the system noise (Morales, 2005; McQuinn et al., 2006) 
and the other sources of calibration errors together have kept the 
cosmologists at bay from detecting this signal and till today only 
a rather weak upper limit on it have been obtained (Paciga et al., 
2013; Dillon et al., 2014; Parsons et al., 2014; Ali et al., 2015). 
Due to these obstacles, it is anticipated that the first detection of 
the signal will be through statistical estimators such as the variance 
(Patil et al., 2014) and the power spectrum (Pober et al., 2014), 
where one adds up the signal optimally to enhance the signal-to- 
noise ratio (SNR). 

Any statistical estimation of a signal comes with an intrinsic 
uncertainty of its own, which arises because of the uncertainties in 
the signal across its different statistically independent realizations. 
In cosmology, this uncertainty is more commonly known as the 
cosmic variance (in other words this is the uncertainty due to the 
fact that we have only one universe to estimate the signal). Apart 
from the cosmic variance there will be uncertainties due to the sen¬ 
sitivity of the instrument as well (e.g. system noise, non-uniform 
baseline distribution etc.). It is necessary to quantify the different 
possible uncertainties in these measurements to correctly interpret 
the signal once it has been detected. If the EoR 21-cm signal had 
the nature and properties similar to a Gaussian random field, the 
estimation of its cosmic variance would have been very straight 
forward, as it scales as the square root of the number of indepen¬ 
dent measurements. Almost all studies (e.g. Morales 2005; Mc¬ 
Quinn et al. 2006; Beardsley et al. 2013; Jensen et al. 2013; Pober 
et al. 2014; Koopmans et al. 2015 etc.) that have been undertaken 
to quantify the detectability of the EoR 21-cm power spectrum us¬ 
ing different instruments such as the MWA, LOFAR, PAPER, SKA 
etc. assume the signal to be a Gaussian random field while estimat¬ 
ing its cosmic variance. This can be a reasonably good assumption 
at large length-scales during the early phases of reionization when 
the Hi is expected to trace the underlying dark matter distribution. 
However, during the intermediate and the later stages of the reion¬ 
ization, the signal only appears from the neutral hydrogen located 
on the periphery of the ionized (Hll) regions which are gradually 
growing both in number and size. This makes the redshifted 21-cm 
signal from the later stages of EoR highly non-Gaussian. 

The statistics of a Gaussian random field is completely speci¬ 
fied by its power spectrum, whereas the higher order statistics like 
the bispectrum (Bharadwaj & Ali, 2005) and the trispectrum are 
also important for a highly non-Gaussian field like the EoR 21-cm 
signal. Though the power spectrum itself cannot capture the non- 
Gaussian nature of the signal, the non-Gaussianity however will 
significantly affect its error estimates (i.e. cosmic variance). This 
has been demonstrated in a recent work by Mondal et al. (2015) us¬ 
ing a large ensemble of simulated EoR 21-cm signal. Mondal et al. 
(2015) have shown that for a fixed observation volume, it is not pos¬ 
sible to obtain an SNR above a certain limiting value, even when 
one increases the number of Fourier modes that goes into the esti¬ 
mation of the power spectrum. The analytical model for the cosmic 
variance proposed in this work further indicates that this limiting 
value of the SNR is directly related to the trispectrum of the signal 
and the total survey volume under consideration. 

In this follow-up work on Mondal et al. (2015), we extend 
their analytical model to derive a generic expression for the entire 


error covariance matrix of the binned 21-cm power spectrum. Us¬ 
ing a large number of realizations of the simulated 21-cm signal 
from EoR we further attempt to quantify the error covariance of its 
power spectrum. We also interpret it in the light of this improved 
analytical model. Since, this study is limited by the finite number 
of realizations of the signal, thus we further check the statistical 
significance of this error covariance. Besides this, the entire anal¬ 
ysis of this paper is based on the numerical simulations of 21-cm 
signal which have a finite comoving volume. We therefore test the 
convergence of our results by increasing our simulation volume. 
Finally, we have tried to extract the trispectrum of the signal from 
the non-Gaussian component of the error covariance of the power 
spectrum. It is also important to note that the nature of the results 
and the analytical model that we have presented here is not limited 
only to the EoR 21-cm signal but can be applied to the analysis 
of any non-Gaussian cosmological signal such as the galaxy red- 
shift surveys (Feldman, Kaiser & Peacock, 1994; Neyrinck, 2011; 
Mohammed & Seljak, 2014; Carron, Wolk & Szapudi, 2015). 

The structure of this paper is as follows. Starting from the ba¬ 
sic definition of the 21-cm brightness temperature fluctuations we 
derive the expressions for the power spectrum and the trispectrum 
of the EoR redshifted 21-cm signal in Section 2. We next derive the 
error covariance of the binned power spectrum estimator and also 
show its relation to the trispectrum in Section 3. Section 4 describes 
the semi-numerical simulations that we have used to generate the 
realizations of the EoR 21-cm signal. In Section 5, we describe 
about the reference ensembles which are used to interpret the re¬ 
sults. In Section 6, we describe our results i.e. the estimated error 
covariance of the power spectrum from the simulated data. Finally, 
in Section 7, we discuss and summarize our results. 

Throughout this paper, we have used the Planck+ WP best¬ 
fitting values of cosmological parameters D m o = 0.3183, Dao = 
0.6817, Q. h0 h 2 = 0.022032, h = 0.6704, a 8 = 0.8347 and 
n s = 0.9619 (Planck Collaboration et al., 2014). 

2 THE POWER SPECTRUM AND THE TRISPECTRUM 

The EoR 21-cm signal is quantified through the brightness temper¬ 
ature fluctuation 

<5T b (*)=T b (jc)-T b . (1) 

In this paper we are interested in the statistical properties of 5Tb(x) 
which is assumed to be a statistically homogeneous random field. 
The two point statistics of 5Tb (x) is quantified through the two- 
point correlation function £ which is defined as 

(5T h {xi) 5T h {x 2 )) = £(xi,x 2 ) (2) 

where the angular brackets (...) denote an ensemble average over 
many statistically independent realizations of 5Tb (x). It follows 
from statistical homogeneity that the two-point correlation func¬ 
tion is invariant if we apply a displacement a to both xi and x 2 , or 
equivalently £ depends only onx 2 i = x 2 — xi the relative displace¬ 
ment vector between the two points xi and x 2 

€(x 1 ,x 2 ) = £(*i +a,x 2 +a) = £(x 2 i) ■ (3) 

The EoR 21-cm signal is not statistically isotropic due to 
redshift space distortion (Bharadwaj & Ali, 2004). While sev¬ 
eral works have attempted to quantify this anisotropy (Majumdar, 
Bharadwaj & Choudhury, 2013; Jensen et al., 2013; Shapiro et al., 
2013; Majumdar et al., 2014; Ghara, Choudhury & Datta, 2015; 
Fialkov, Barkana & Cohen, 2015; Majumdar et al., 2015), in this 
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work we only consider ^(* 21 ), which is the monopole (isotropic) 
component of £(* 21 ). 

We now consider the four point statistics (see e.g. equation 
35.3 of Peebles 1980) 

(<5Tb(^i) 5Tb{x 2 ) 5Tb(x 3 ) 5Tb(x4)) = f (*i2)C(*34) 

+ C( a; 13)f(*24) + C(*14)C( a; 23) + Tj{x 1,X2,X3 ,Xa) ( 4 ) 

where the (reduced) four-point correlation function rj quantifies the 
excess over the product of £s. Here, statistical homogeneity implies 
that r/ is invariant if we apply a displacement a to xi , X 2 , *3 and X 4 
i.e. 

r/(xi,X2,X3,X4) = rf(xi +a,x 2 +a,x 3 +a,x 4 +a ) (5) 

or equivalently r/ depends only on three relative displacement vec¬ 
tors 

r/(xi,X2,X3,X4) = r/(X21,X31,X4l) ■ ( 6 ) 

It is convenient to use the Fourier representation considering a 
cubic comoving volume V with periodic boundary conditions. We 
then have 

T b (x) = yJ2 etkXfb ( k ) (7) 

k 

where Th(k) is the Fourier transform of Tb(x). Note that the wave 
vector k assumes both positive and negative values, however these 
are not independent as we have the relation Tb{k) = Tb(—k) 
which holds for the Fourier transform of a real quantity. Further, 
we can equally well interpret Tb{k) as the Fourier transform of 
5Tb(x) for all values of k barring k — 0. 

We first consider the two-point statistics. Incorporating the 
Fourier representation equation (7) in equation (2), we have 

£( xi , x 2 ) = e i(klX ' +k2 - X2) (Tb^rJTb^a)) ( 8 ) 

*1,^2 

We see that the r.h.s. picks up an extra phase factor Q = e *(*i+* 2 )-« 
if we apply a displacement a to both xi and X 2 . However, the as¬ 
sumption of statistical homogeneity (equation 3) requires equation 
(8) to be invariant under such a displacement. This implies that 
(Tb(Jfci)Tb(* 2 )> has non-zero values only when fci + £2 = 0 for 
which Q = 1, and it is zero when0, We than have 

{Tb (k \) Tb {1^2 )) = 5k 1 +i l2 ,oVP(ki) (9) 

where the Konecker delta Sk 1 +k 2 ,o is 1 if k\ + k 2 = 0 and 0 other¬ 
wise. Here the power spectrum P{k) = P(k) is defined as 

P(k) = V~ 1 {Tb(k) Tb(—k)} . (10) 

Using equation (9) in equation (8), we have 

«xi,x a ) = e i(*i*i+W x v '5k 1 +k 2 ,oP{k 1 ) (11) 

*1>*2 

whereby we see that the power spectrum is the Fourier transform 
of the two-point correlation function 

£(*2! ) = (12) 

k 

Proceeding in exactly the same manner for the four-point 
statistics (equation 4), statistical homogeneity (equation 5) requires 
that 

(T b (o)f b (&)f b (c)T b (d)> = V 2 [5 a+b , 0 5 c+d , 0 P{a)P{c) 

+ 5 a +c,o5b+d,oP(a)P{b) + 5 a +d,o5b+c,oP{a)P(b)] 

+ V5 a+b+c+d}0 T(a,b,c,d) (13) 


where we have used the notation Tb (a) = T b (k a ). Here the trispec¬ 
trum T(k a ,k b ,k c ,kd) is the Fourier transform of the four-point 
correlation function 

^(xr.xa.xs.x*) = ^ £ ^,+k^+k^+k^) 

kl 

x V5k 1 +k 2 +k 3 +k i ,oT(ki,k2,k3,k4). (14) 

Note that equation (14) for the four point statistics is exactly analo¬ 
gous to equation (11) which has been discussed earlier for the two- 
point statistics. We can also carry out the sum over k\ and express 
equation (14) as 

»7(X21,X31,X41) = ^ Y, e i( * 2 '* 21+ * 3 '* 31+ * 4 ' l;4l) 

k 2 3 I k 4 

x T(—k 2 — k 3 — k4,k2, ^ 3 ,^ 4 ) • (15) 

The entire analysis of this paper is based on numerical simula¬ 
tions which have a finite comoving volume V. The various factors 
of V that appear in equations (9), and (12) - (14) leave one won¬ 
dering whether the power spectrum and particularly the trispectrum 
would vary if the volume V were changed. To address this, we con¬ 
sider the limit V —¥ 00 . In this limit the power spectrum 

[-P(*)]<x> = J £{x 2 i)e~' kX21 d 3 x 2 i (16) 

and the trispectrum 

[T(-k 2 -k 3 -k 4 ,k 2 ,k 3 ,k 4 )] ao = J rj(x 2 i,x 3 i,* 4 i)x 

e -i(k 2 .x 21+ k 3 -x 31 +k 4 -x 4 i) d 3 X2i d 3 X3i d 3 X4i (17) 

have finite, well defined values provided the integrals 

/ | ^(x 2 i) | d 3 x 21 (18) 

and 

| »7(X21,X31,X4l) | d 3 X2i d 3 x 3 i d 3 X41 (19) 

respectively converge. 

We have assumed that ^(x 2 i) and r/(x 2 i,x 3 i,X 4 i) fall suffi¬ 
ciently rapidly at large separations so that the integrals in equations 
(18) and (19) both converge. The limiting power spectrum [P]oc 
and trispectrum [TJoo then have finite, well defined values, and the 
simulated P and T would respectively converge to [P]oo and [T]oo 
if the simulation volume V were increased. In our analysis we as¬ 
sume that our simulations cover a sufficiently large volume of the 
universe whereby the simulated power spectrum and trispectrum 
are respectively sufficiently close to [P] ^ and [T] 00 for the k range 
of our interest, and the simulated values would not change signifi¬ 
cantly if the volume V were increased further. 


3 THE ERROR-COVARIANCE OF THE POWER 
SPECTRUM 

The question here is ‘How accurately can we estimate the power 
spectrum from a given EoR data?’. In general, any observation will 
yield a combination of the EoR signal and instrumental noise, as¬ 
suming that the foregrounds have been completely subtracted out. 
In this analysis, we only consider the statistical errors which are 
inherent to the EoR signal, and we do not consider the instrumen¬ 
tal noise. The statistical errors which we have considered here are 
usually referred to as the cosmic variance. 


4 Mondal, Bharadwaj and Majumdar 


We consider the binned power spectrum estimator P(ki) 
which, for the i th bin, is defined as 

P(ki) = f b (-k), (20) 

i k 


where N ki and ki respectively refer to the sum, the number 
and the average comoving wavenumber of all the Fourier modes 
in the i th bin. The bins here are spherical shells of width A ki in 
Fourier space. We have used logarithmic binning which essentially 
implies that A ki (oc ki) will vary from bin to bin. As the modes 
k and —k do not provide independent estimates of the power spec¬ 
trum, we have restricted the sum to the upper half of the spheri¬ 
cal shell which has volume (27r) k 2 A ki in k space. To calculate 
Nk , the number of Fourier modes in this volume, we note that the 
different wave vectors k are all equally spaced at a separation of 
(2 tt)/H 1/3 infc space. We consequently have 


N ki 


(2n)ki 2 Aki 
[(2tt)/F1/3] 3 


V 


(2tt) : 


x ki Aki 


( 21 ) 


which we use to estimate N ki . 

The ensemble average of the estimator gives the bin-averaged 
power spectrum 


(P(ki))^P(ki) = ^-J2 P ^- C 22 ) 

1 a 

The error-covariance of the power spectrum estimator 

Cij = ([P(ki) - P(ki)] [P{kj) - Pikj)]) (23) 


is the quantity of interest here. This can also be written as 

Cii = [< P(ki) P(kj))} - P(ki) P(kj) . (24) 


and the term in the square brackets [...] of equation (24) can be 
expressed as 

JL— Y, (T h (a)f h (-a)f h (b)f h (-b)). (25) 
ki Vfc J V aei,bej 


Using eq. (13) to simplify eq. (25) we can express the error covari¬ 
ance as 


where 


_P 2 (ki) T{ki,kj) 

k-'i'i » t Oij i 


N ki 


V 


p2 ^)™E p2 («) 


(26) 


(27) 


is the square of the power spectrum averaged over the i th bin, and 


T(ki, kj) 


1 

N kz N kj 


E T(a, —a, b, —b) 

a£.i,b£j 


(28) 


is the average trispectrum where k a and kb are summed over the 
i th and the j th bins respectively. 

We first discuss the results expected for a Gaussian random 
field for which the trispectrum is zero. In this case we can use equa¬ 
tion (21) to express the covariance matrix as 


1 \{2n) 2 P 2 {ki) 

ij V [ kfAki 


(29) 


The first point here is that the covariance matrix is diago¬ 
nal. This implies that the errors in the different bins are uncor¬ 
related. The second point is that the covariance matrix scales as 
Cij oc (V Aki)^ 1 if we increase the observational volume V or 
the bin width Aki. 


It is possible to interpret the diagonal elements Cu as the error 
variance Cu = [ 5P(ki )] 2 for the power spectrum. We can then 
express the error in the estimated power spectrum as 


SP(ki) = 


(2n ) 2 P 2 (ki) 
Vk 2 Aki 


(30) 


which is analogous to the error estimate in the context of galaxy 
redshift surveys (e.g. equation 11.119 of Dodelson 2003). We see 
that the error falls as SP(ki) oc 1/vT if we increase the obser¬ 
vational volume. For a fixed observational volume, we expect the 
error to fall as SP(ki) oc 1 /\J Ak, until it reaches a minimum value 
which is achieved when all the Fourier modes are combined into a 
single bin. 

The EoR signal becomes increasingly non-Gaussian as the 
reionization proceeds. This manifests itself as a non-zero trispec¬ 
trum in the error-covariance (equation 26) which can be expressed 
as 


Cij — 


1 

V 


f ( 2 tt ) 2 P 2 (ki) \ 

V *? Aki ) 


bij + T(ki, kj) 


(31) 


The covariance matrix still retains the 1/1/ dependence, sim¬ 
ilar to the Gaussian random field discussed earlier. Consequently 
we still expect the errors in the estimated power spectrum to go 
down as 1/ y/V if the observational volume is increased. However, 
the covariance matrix now has two major differences from that of a 
Gaussian random field. 

The first difference is that the covariance matrix is no longer 
diagonal. The average trispectrum T(ki, kj) quantifies the correla¬ 
tion between the EoR signal in two different bins ( i and j). The off- 
diagonal elements of the covariance matrix (Cij = T(ki, kj)/V) 
quantifies the correlation between the errors in the power spectrum 
estimated in the i and j bins respectively. 

The second difference is that the diagonal terms of the covari¬ 
ance matrix deviate from the Cu oc 1/Afc; behaviour predicted 
for a Gaussian random filed. For small bin-widths (Aki k 2 
(2n) 2 P 2 (ki) /T(ki,ki)), we expect the error variance to fall 
as Cu oc 1/Afo as the bin-width Aki is increased. The er¬ 
ror variance Cu saturates as the bin-width approaches Aki k 2 fts 
(27 r) 2 P 2 (ki) /T(ki,ki), and it does not fall below the limiting 
value [Cu]i = T(ki, ki)/V even if all the Fourier modes are com¬ 
bined into a single bin. 

For a Gaussian random field, we expect the signal to noise ra¬ 
tio SNRi = P{ki)/8P(ki) to increase as SNRi oc y/Nxi if we 
increase the number of modes N ki in the bin . The SNR, however, 
will saturate at a limiting value [SNRi]; = P(ki) / y/[Cu]i when 
the EoR 21-cm signal becomes non-Gaussian. Semi-numerical 
simulations show (Mondal et al., 2015) that the SNRi oc y/N ki 
behaviour only holds for small SNRi, and SNR; saturates at a lim¬ 
iting value [SNRi] j as N ki is increased. The limiting value [SNRi] ; 
is found to decrease (i.e. Cu increases) as reionization proceeds. 

The expected Cu oc 1/A ki behaviour is a consequence of the 
fact that the signal in the different Fourier modes T b (k) is inde¬ 
pendent for a Gaussian random field. The EoR signal at the differ¬ 
ent Fourier modes, however, become correlated as ionized bubbles 
develop and the Hi signal becomes non-Gaussian. The trispectrum 
quantifies this correlation between the signal at different Fourier 
modes. The fact that Cu saturates and does not decrease beyond 
[Cu]i even if we increase A ki is a consequence of the fact that we 
are not adding independent information by increasing the number 
of Fourier modes in the bin. 

The trispectrum T(ki,k 2 ,ky,k 4 .) is, in general (equation 13), 
sensitive to correlations in both the amplitude and the phase 
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of the signal at the different Fourier modes Tb(ki), Tb(ki), 
T h (k 3 ) and Tb(& 4 ). The average trispectrum T(ki,kj) (equa¬ 
tion 28). which appears in our expression for the error co- 
variance (equation 26), however, depends only on the term 
( Tb(k) Tb (—k) Tb{k ) Tb (—k )) which is insensitive to correla¬ 
tions in the phase of the modes Tb(k) and Tb{k ). We therefore see 
that the error covariance Cij (equation 31) is only affected by the 
correlations in the amplitude of Tb{ki) and Tb(kj), it is insensitive 
to purely phase correlations between the signal at these two modes. 

In summary of this section we note that the non-Gaussianity 
introduces an extra term T(ki, kj) /V in the error covariance (equa¬ 
tion 31). As a consequence the error variance Ca for the binned 
power spectrum saturates at a limiting value [Cu]i, it is not pos¬ 
sible to decrease the error in the estimated power spectrum be¬ 
yond y[Cii]7 by increasing the number of Fourier modes in the 
bin. Further, the error covariance matrix Cij is not diagonal. The 
off-diagonal terms quantify the correlations between the errors in 
the power spectrum estimated at different bins. 


4 SIMULATING THE EOR REDSHIFTED 21-CM 
SIGNAL 

We have used semi-numerical simulations to generate the EoR 
redshifted 21-cm signal. These simulations consist of three main 
steps. First, we use a particle mesh A'-body code to generate the 
dark matter distribution at the desired redshift. We have run sim¬ 
ulations with two different comoving volumes Vi = [150 Mpc] 3 
and Vi = [215 Mpc] 3 using grids of size 2144 3 and 3072 3 , re¬ 
spectively. The spatial resolution 0.07 Mpc and the mass resolution 
1.09 x 10 8 Mq is maintained the same for both Vi and Vi. In the 
next step we identify the mass and the location of collapsed haloes 
using the standard Friends-of-Friends (FoF) algorithm (Davis et al., 
1985) with a fixed linking length of 0.2 times the mean inter¬ 
particle distance. We have set the criterion that a halo should have 
at least 10 dark matter particles whereby we have a minimum halo 
mass of 1.09 x 10 9 Mg 

The final step generates the ionization map based on the excur¬ 
sion set formalism of Furlanetto, Zaldarriaga & Hernquist (2004). 
The basic assumption here is that the hydrogen traces the dark mat¬ 
ter distribution and the dark matter haloes host the sources which 
emit ionizing radiation. It is assumed that the number of ioniz¬ 
ing photons emitted by a source is proportional to the mass of the 
host halo, and it is possible to achieve different values of the mass 
averaged Hi neutral fractions xti i by tuning this constant of pro¬ 
portionality. Our simulation closely follows Choudhury, Haehnelt 
& Regan (2009) to generate the ionization map, and the resulting 
Hi distribution is mapped onto redshift space to generate 21-cm 
brightness temperature maps following Majumdar, Bharadwaj & 
Choudhury (2013). The grid used to generate the ionization maps 
and the 21-cm brightness temperature maps is eight times coarser 
than that used for the iV-body simulation. 

The redshift evolution of xh i is, at present, largely unknown. 
Instead of assuming a particular model for xh i(-o), we have fixed 
the redshift z = 8 and run our simulations for different values of 
xh i at this fixed redshift. We have simulated Hi maps for xh i val¬ 
ues at an interval of 0.1 in the range 1.0 > xh i > 0.3 in addition 
to Xh i = 0.15. For each simulation volume (Vi and Vi) and for 
each value of xh i, we have run 50 independent simulations to gen¬ 
erate an ensemble of 50 statistically independent realizations of the 
21-cm signal. We refer to this ensemble as the Signal Ensemble 
(SE). The left-hand panel of Fig. 1 shows a section through one re¬ 


alization of the SE for 1 h i = 0.5. We have used the SE to estimate 
the bin-averaged power spectrum P(ki) and the error covariance 
matrix Cij for the two different simulation volumes Vi and Vi, and 
for the different xh i values mentioned earlier. 


5 SIMULATING REFERENCE ENSEMBLES 

The previous section describes how we have estimated the power 
spectrum error-covariance Cij. In summary, we have constructed 
an ensemble of 50 statistically independent realizations of the sim¬ 
ulated EoR 21-cm signal and used this to estimate Cij. We refer to 
this ensemble as the SE. The question now is ‘How do we inter¬ 
pret the estimated Cij T. We know that for a Gaussian random field 
we expect: (A.) the diagonal terms to have values as predicted by 
equation (29), and (B.) the off-diagonal terms to be zero. We may 
interpret any deviation from this as arising from non-Gaussianity, 
and then use these deviations to quantify the contribution from the 
trispectrum in equation (31). While this is straightforward in con¬ 
cept, several complications arise in practice. 

5.1 The Randomized Signal Ensemble 

The first complication arises when we try to interpret the diagonal 
terms Ca. We expect these to have values as predicted by equation 
(29) if the signal were a Gaussian random field, and it is possible 
to interpret deviations from this relation in terms of the trispec¬ 
trum which appears in equation (31) when the signal becomes non- 
Gaussian. The problem arises because it is not possible to use the 
SE to independently determine the value of P 2 (ki) which appears 
in equation (29). We have overcome this problem by constructing 
the Randomized Signal Ensemble (RSE). 

Each realization of RSE contains the signal drawn from all 
the 50 realizations in SE. We have labeled all the modes in the 
simulation volume as fci, ki ,.... Note that we are free to choose 
any arbitrary labeling scheme as long as it assigns an unique label 
to each distinct Fourier mode k. The Fourier modes are then divided 
into sets _4i = {fci,fe 5 i,* 10 i,...}, Ai = {ki,k 5 i,k 102 ,...},... 
Asa = {^ 50 ,^ 100 ,^ 150 , For the first realization in RSE, the 
signal for all the modes in „4i is drawn from the first realization in 
SE (i.e. [SE]i), and the signal for all the modes in Ai is drawn 
from the second realization in SE (i.e. [SE] 2 ), and so on. The first 
realization in RSE thus contains a mixture of the signal drawn from 
all the 50 realizations in SE. For the second realization in RSE, 
the signal for all the modes in A i is drawn from [SE] 2 , and the 
signal for all the modes in Ai is drawn from [SE ]3 and so on. The 
second realization in RSE also contains signal drawn from all the 
50 realizations in SE. Further, there is no signal which is common 
between the first and second realization in RSE. The 50 realizations 
in RSE have all been constructed in this fashion such that each 
realization of RSE contains a mixture of the signal from all the 50 
realizations in SE. Further, none of the realizations in RSE have any 
signal in common. The right-hand panel of Fig. 1 shows a section 
through one realization of the RSE for xh i = 0.5. 

We do not expect the signal in the modes drawn from SEi to 
be correlated with those drawn from SE 2 , etc. We therefore expect 
the average trispectrum T(ki, kj) to be at least 50 times smaller for 
RSE as compared to SE. For the purpose of this work we have as¬ 
sumed that T(ki, kj) « 0 for RSE. Further, since the entire signal 
in SE is also present in RSE, we expect P(ki) and P 2 (ki) to have 
exactly the same value in both SE and RSE. The RSE, therefore, 
provides an independent estimates of P 2 (ki). We have used this 
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Figure 1 . Two-dimensional sections through the simulated Hi brightness temperature maps for xh i = 0.5 and [150 Mpc] 3 volume. The three panels each 
show a single realization drawn from the three different ensemble. Signal (left), Gaussian Random (middle) and Randomized Signal (right). The direction of 
redshift space distortion is with respect to a distant observer located along the horizontal axis. 


to estimate the values which the diagonal elements of Cij (equa¬ 
tion 31) are expected to have if the EoR signal were a Gaussian 
random field with T(ki, kj ) = 0. It thus becomes possible to inter¬ 
pret any deviations from this as arising from T(ki, kj) due to the 
non-Gaussianity in the EoR 21-cm signal. 

5.2 Ensemble of Gaussian Random Ensembles 

The second complication arises from the fact that the SE has a finite 
number of realizations. To appreciate this we construct the Gaus¬ 
sian Random Ensemble (GRE). The GRE, like the SE, contains 50 
realizations of the 21-cm signal, the signal in each realization how¬ 
ever is a Gaussian random field. The signal at any mode k in the 
i th bin is calculated using 

f b (k) = ^Hk) + ib(k)} (32) 

where a(k) and b(k) are two real valued independent Gaussian 
random variables of unit variance, and P{ki) is the bin-averaged 
power spectrum calculated from SE. The middle panel of Fig. 1 
shows a section through one realization of the GRE for s;h i = 0.5. 

The bin-averaged power spectrum estimated from any single 
realization in GRE will be different from P(ki). Further, the bin 
averaged power spectrum estimated using all 50 members of GRE, 
which we refer to as [P(ki)]G, will also differ from P(ki) because 
of the limited number of realizations. Similarly, the off -diagonal 
terms of the error-covariance [Cij]G estimated from GRE will not 
be zero but will have random fluctuations around zero due to the 
limited number of realizations. It is thus necessary to compare the 
Cij estimated from SE against the random fluctuation of [Cij]G in 
order to determine whether Cij estimated from SE is statistically 
significant or not. The issue now is to estimate the variance of the 
covariance [(%,■] g. We have used 50 independent GREs to construct 
an Ensemble of Gaussian Random Ensembles (EGRE) which we 
have used to estimate the variance [SCij ]o of [Cij]G- In summary, 
we cannot straightaway interpret the non-zero off-diagonal terms in 


Cij as arising from non-Gaussianity in the EoR 21-cm signal. It is 
necessary to assess the statistical significance of the non-zero val¬ 
ues by comparing them against [8Cij]G estimated from the EGRE. 


6 RESULTS 

Fig. 1 shows three 21-cm maps corresponding to individual real¬ 
izations drawn from the SE, GRE and RSE respectively. The simu¬ 
lations all correspond to the same neutral fraction ihi = 0.5 and 
they all have the same bin averaged power spectrum P(fc;). It is be¬ 
lieved that at the length scales which will be probed by observations 
the EoR 21-cm signal (in terms of both power spectrum and vari¬ 
ance) peaks around ih i ~ 0.5 (see e.g. McQuinn et al. 2007; Lidz 
et al. 2008; Barkana 2009; Choudhury, Haehnelt & Regan 2009; 
Mesinger, Furlanetto & Cen 2011; Jensen et al. 2013; Majumdar, 
Bharadwaj & Choudhury 2013; Iliev et al. 2014; Patil et al. 2014; 
Watkinson & Pritchard 2014; Majumdar et al. 2015), and we have 
thus restricted the entire discussion of this section to the situation 
where ih i = 0.5. At this stage we expect a little less than 50% 
of the volume to be occupied by ionized bubbles. These bubbles, 
which are quite distinctly visible in the left-hand panel, cause the 
EoR 21-cm signal to be significantly non-Gaussian atini = 0.5. 
This is quite apparent if we compare the EoR signal to the central 
panel which is a Gaussian random field. There are no bubbles visi¬ 
ble in the central panel. The Randomized Signal (shown in the right 
most panel of the same figure), which has a much smaller trispec¬ 
trum compared to the EoR signal, looks quite distinct from both the 
other cases. 

Fig. 2 shows the mean squared brightness temperature fluc¬ 
tuation of the EoR 21-cm signal A 2 (fc) = k 3 P(k) / (2n) 2 as a 
function of k. This essentially is a measure of the bin averaged 
21-cm power spectrum P(k) estimated from SE. The k range 
kmin = 2.09 x 10 -2 Mpc -1 to fc max = 5.61 Mpc -1 has been di¬ 
vided in 10 equally spaced logarithmic bins with A ki/ki « 0.48. 
We have maintained the same bin widths for both the simulation 
volumes Vi = [150 Mpc] 3 and V 2 = [215 Mpc] 3 . However, we 
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A^(fc) and its la error bars for in g = 0.5. The results are shown for 
simulations with the two different box size of 150Mpc and 215Mpc, re¬ 
spectively. 



Figure 3. This shows Ca for SE considering both the simulation volumes 
Vi and V 2 . We also show (V 1 /V 2 ) [Ca] v\ where Ca determined for Vi 
has been scaled to account for the 1/V dependence predicted by equation 

(31). 


notice that the value of ki, the average k value corresponding to a 
particular bin, varies from Vi to V 2 (Fig. 2). This variation arises 
because the exact number and values of the Fourier modes in a 
particular bin changes if we change the simulation volume even 
though A ki is fixed. Comparing the results from the two simula¬ 
tion volumes, we see that there is very little change in the power 
spectrum between Vi and V 2 . This indicates that the simulation 
volumes used here are sufficiently large so that the power spectrum 
has converged. The error bars shown in the figure correspond to the 
1 — a error SP(ki) = y/Ca estimated from SE. We notice that the 
error bars change from V\ to V 2 , the errors being smaller for the 
larger simulation. This arises from the Ca oc 1/V dependence (eq. 
31) discussed earlier. A detailed analysis of the covariance matrix 
Cij follows. 

We now shift our attention to the error covariance matrix Cij 
which is the main focus of this paper. Fig. 3 shows the diagonal 
elements Ca as a function of k for the two different simulation 
volumes Vi and V 2 . We have also shown ( V 1 /V 2 ) [Cii]yy where 
the matrix elements determined for Vi have been scaled to account 
for the 1/V dependence predicted by equation (31). We see that the 


scaled elements are in reasonable agreement with the results for V 2 , 
roughly indicating that the error-covariance has converged within 
the simulation volume which we have used here. We see that the 
values of the covariance matrix span a very large dynamical range, 
and it is not very convenient to analyse this if we are looking for 
relatively small changes in the values. We find that it is much more 
convenient to instead use the dimensionless covariance matrix ctj 
which is defined as 

Cij V k 3/2 k 3 / 2 

c *i — Yn \2 P(U \ p/l. \ ’ ( 33 ) 

(2n) 2 P(ki) P(kj) 

and which, using equation (31), can be expressed as 


where 


= A 2 


ki 

A ki 


dii ”t~ ti 


f(ki,kj)k 3/2 k 3 / 2 
(27 !)*P(ki)P{kj) ’ 


(34) 


(35) 


is the dimensionless bin-averaged trispectrum and 


Ai = 


P 2 (ki) 

[P(h )] 2 ' 


(36) 


is a number of order unity introduced in Mondal et al. (2015). The 
value of Ai is expected to vary from bin to bin. We also expect 
its value to vary if we change the simulation volume. Flowever, 
all these variations are expected to be small, and we may expect a 
value Ai « 1 in most situations. 

The left-hand panel of Fig. 4 shows ca, the diagonal elements 
of the dimensionless covariance matrix, as a function of k. The 
volume dependence of Ca has been scaled out in the definition of 
Ca (equation 33), and we do not expect the ca values to change 
if we vary the volume provided that the error-covariance has con¬ 
verged within the simulation volume. We find that the values of ca 
obtained from the two different volumes Vi and V 2 are consistent 
with each other over the range 0.1 < k < 0.5 Mpc -1 . The val¬ 
ues obtained from V 2 , however, are ~ 1.5 times larger than those 
obtained from Vi at larger values k > 0.5 Mpc -1 . The difference 
at small k (k < 0.1 Mpc -1 ) may be attributed to the cosmic vari¬ 
ance of the error-covariance and is possibly not statistically sig¬ 
nificant. Flowever, the differences between Vi and V 2 at large k 
appears to be significant. We find that the smaller volume Vi is 
under-estimating the error-covariance relative to V 2 , indicating that 
for k > 0.5 Mpc -1 the error-covariance has not converged within 
the simulation volume. One would naively expect convergence is¬ 
sues to be more important at large scales which are comparable to 
the simulation size. The fact that the error-covariance appears to 
have converged at large scales (0.1 < k < 0.5 Mpc -1 ) while it 
seems to have not converged at small scales (fc > 0.5 Mpc -1 ) is 
quite counter intuitive, and we shall address this a little later. 

The right-hand panel of Fig. 4 shows ca estimated from the 
RSE for which we expect tij ~ 0, whereby 


JRSE 


= A 2 



(37) 


This gives an estimate of the error-covariance that would be ex¬ 
pected if the EoR signal were a Gaussian random field. As ex¬ 
pected, we see that the values of [cm]rse are below those estimated 
from SE. Mondal et al. (2015) have estimated the value of Ai in 
a completely independent manner by fitting the behaviour of the 
SNR as a function of Nk . The latter method ignores the fact that 
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Figure 4. This shows cu for SE (left) and RSE (right). The prediction based on using the constant A from Mondal et al. (2015) in equation (37) is also shown 
in the right-hand panel. 



Ai varies from bin to bin, and returns just a single value of A which 
is A = 0.98 for i = 0.5. We have also plotted [cmIrse using 
this A value and the ki / Aki values corresponding to the k bins in 
Vi. We note that A ki/ki « 0.48, though the actual value changes 
somewhat from bin to bin. We find that [c^rse estimated from the 
VI and Vi RSE simulations, and also from equation (37) using the 
constant A, are all consistent with one another.This consistency, in 
a sense, also validates the idea that the RSE allows us to indepen¬ 
dently estimate the error-covariance that would be expected if the 
EoR signal were a Gaussian random field (equation 37). 

We further illustrate the idea behind the RSE and also validate 
this in Fig. 5. Recollect that each realization in the RSE contains a 
mixture of signal from 50 independent realizations of the EoR sig¬ 
nal, and we expect tij for RSE to be at least 50 times smaller than 
tij estimated from SE. In addition to RSE, we also show results for 
RSE10 and RSE25. Each realization in RSE10 has signal drawn 
from 10 independent realization from SE instead of 50. We expect 
tij for RSE 10 and RSE25 to be respectively around 10 and 25 times 
smaller than tij estimated from SE. Starting from SE (equation 34) 
, we expect the values of cu to slowly approach equation (37) as 
we move from RSE 10 to RSE25 and then to RSE. This transition 
is clearly seen in Fig. 5. There is very little change in the values of 



Figure 6. This shows tu estimated from the two different simulation vol¬ 
umes Vi and Vi. 


Ca from RSE25 to RSE50 ( except possibly at the largest k value). 
This validates the assumption that tij « 0 for the RSE. 

The difference Cu — [Cm]rse gives an estimate of the bin- 
averaged trispectrum. Here we have used tu = cu — [c«]rse 
to estimate the dimensionless bin-averaged trispectrum for which 
the results are shown in Fig. 6. We see that the results for the two 
different simulation volumes look quite similar, though there are 
some differences in the actual values. The tu values estimated from 
the larger volume Vi are larger than those estimated from Vi at 
k > 0.2 Mpc -1 . The tu values differ by a nearly constant ratio 
of 1.5 at k > 1 Mpc -1 . The trend is reversed at k < 0.2 Mpc -1 
where the values estimated from Vi are larger than those from Vi. 
Taken at face value, these discrepancies in the values of tu between 
the two different simulation volumes indicate that the trispectrum 
has not converged within the simulation volume. We note, however, 
that it is necessary to be cautious before arriving at such a conclu¬ 
sion because we have no estimate of the cosmic variance for tu. 
The discrepancy at large k is possibly genuine, whereas the discrep¬ 
ancy at small k is possibly influenced by the cosmic variance. For 
the subsequent discussion in this paper we focus on the larger vol¬ 
ume Vi assuming that the results are representative of what would 
be expected for an even larger volume. 

We see (Fig. 6) that we have tu ~ 1 for k ~ 0.1 Mpc -1 , and 
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Figure 8. This shows r^j estimated for a GRE. 



it increases quite rapidly with tn ~ 10 and ~ 10 3 at k ~ 1 Mpc -1 
and ~ 5 Mpc -1 respectively. In contrast, we have [c«]fise ~ 5 
for nearly the entire k range (Fig. 4). We thus expect the error- 
covariance ca to be largely dominated by the trispectrum tu for 
nearly the entire k range that we have considered here. Fig. 7 shows 
the ratio tu/[di]RSE which quantifies the relative magnitudes of 
the two terms that contribute to cu (equation 34). We see that the 
two terms make roughly equal contributions in the range 0.2 < 
k < 0.3 Mpc -1 . The relative contribution from the trispectrum 
increases quite steeply with increasing k. At the largest k value 
(~ 5 Mpc -1 ), the contribution from the trispectrum is ~ 200 times 
larger than the error-covariance that we would expect if the EoR 
signal were a Gaussian random field. 

We now shift our focus to the off-diagonal elements of dj 
which quantify the correlation between the errors at different k 
bins. Since the diagonal terms cu span a pretty large dynamical 
range, it is more convenient to consider the correlation coefficient 


y/CU Cj 


(38) 


instead of directly analysing the off-diagonal terms of dj. The val¬ 
ues of dj are, by definition, restricted to lie in the range —1 < 
dj < 1, the values dj = 1 and —1 indicating that the errors in 
the i and j bin are fully correlated and anti-correlated respectively. 
Intermediate values (—1 < rij < 1) indicate partial correlation or 


anti-correlation, and dj = 0 indicates that the errors in the i and 
j bins are uncorrelated. Also note that we have dj = 1 for all the 
diagonal elements of rij. We first consider the GRE for which the 
errors in the different bins are uncorrelated. Fig. 8 shows rij esti¬ 
mated using a single GRE. We see that in addition to the diagonal 
elements which have value ru = 1, the off-diagonal elements also 
have non-zero values. As discussed in Section 5.2, these non-zero 
values are from random fluctuations which are a consequence of 
the limited number of realizations in the GRE. Fig. 9 shows dj 
estimated from SE. We see that the results from both the simula¬ 
tion volumes of SE look very similar. Comparing the SE with the 
GRE, we see that while the dj values in Fig. 8 (GRE) appear to be 
quite random, Fig. 9 (SE) exhibits some sort of an organized pat¬ 
tern. The most prominent feature which we notice is that the errors 
in the five largest k bins (k > 0.5 Mpc - ) are strongly correlated. 
Further, the errors in the three smallest k bins (fc < 0.3 Mpc - ) 
are correlated with the three largest k bins (k > IMpc -1 ). Fi¬ 
nally, we also find a relatively weak anti-correlation between the 
two smallest k bins (k < 0.1 Mpc -1 ) and the intermediate bins 
- 0.2 - 0.4Mpc -1 . 

Fig. 10 shows the dj values estimated from SE for both the 
simulation volumes Vi and V 2 . Each panel of the figure corre¬ 
sponds to a fixed value of i, and it shows dj as a function of kj. We 
have used the EGRE (Section 5.2) to estimate [Srij]G which quan¬ 
tifies the fluctuation of the off-diagonal terms around [rij] a = 0 
expected for a Gaussian random field. For reference, we have also 
shown rij estimated from RSE with V 2 . Note that in all cases we 
have rij = 1 for the diagonal terms which have j = i. 

We expect [£r,-]rse ~ 0, which implies that we also expect 
[fi^RSE = 0 for the off-diagonal terms. We find that the values es¬ 
timated from RSE are nearly always within the shaded region cor¬ 
responding to [5rij]G, indicating that our results are indeed consis¬ 
tent with [r,;j ]rse = 0. This is yet another validation of the fact that 
the method by which we have generated the RSE actually destroys 
the correlation between the signal at different Fourier modes and 
results in [UjJrse ~ 0. The results from Vi and V 2 are quite simi¬ 
lar for SE. Further, there are several regions where the dj values for 
SE are outside the shaded region. We interpret these as being statis¬ 
tically significant and discuss these below. We find that the errors in 
the five largest bins (k > 0.5 Mpc) are strongly correlated with the 
correlation coefficient having values r ij > 0.6. The correlation in¬ 
creases to rij > 0.9 if we consider just the three largest k bins. The 
errors in the three smallest k bins (k < 0.2 Mpc) are also correlated 
with the errors in the five largest k bins. The errors in the two small¬ 
est k bins (k <0.1 Mpc), however, are weakly anti-correlated with 
the errors in the 4-th and 5-th bins (0.2 < k < 0.4 Mpc). 


7 SUMMARY AND DISCUSSION 

The error-covariance matrix of the EoR 21-cm power spectrum is 
an important ingredient for making predictions for ongoing and fu¬ 
ture experiments to detect the EoR signal. In this work we only 
consider the errors which are intrinsic to the EoR 21-cm sig¬ 
nal, i.e. the cosmic variance, and ignore the system noise aris¬ 
ing from radio-interferometric observations. The EoR 21-crn sig¬ 
nal becomes increasingly non-Gaussian as reionization proceeds. 
Non-Gaussianity introduces correlations between the signal in dif¬ 
ferent Fourier modes, this being quantified through the bispec¬ 
trum, trispectrum, etc. While the power spectrum itself does not 
tell anything as to whether the underlying signal is Gaussian or 
non-Gaussian, we show that the error-covariance matrix Cij for the 
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Figure 9. This shows estimated for SE considering both the simulation volumes Vi (left) and V 2 (right). 


binned power spectrum is sensitive to the non-Gaussianity through 
the bin averaged trispectrum T(fc;, kj) which appears in equation 
(31). 

The error covariance matrix scales inversely with the volume 
as Cij oc V -1 , and it is more convenient to analyse the dimen¬ 
sionless error covariance matrix dj (equation (33)) which is inde¬ 
pendent of volume. We have used an ensemble of 50 independent 
realizations of the simulated EoR 21-cm signal (referred to as the 
SE) to estimate aj. The entire analysis was restricted to a single 
neutral fraction x-r 1 = 0.5. The left-hand panel of Fig. 4 shows 
ca, the diagonal elements of dj, as a function of k. We can in¬ 
terpret each diagonal element c„ as the dimensionless error vari¬ 
ance for the power spectrum estimated in the corresponding bin. 
For the A ki bins used here, we expect the dimensionless error vari¬ 
ance to have a value ca « 2 across all the k bins if the EoR sig¬ 
nal is a Gaussian random field. We find a roughly constant value 
ca ~ 5 in the k range 0.05 < k < 0.5 Mpc -1 , the value of ca 
increases sharply beyond k > 0.5 Mpc -1 and we have ca ~ 10 3 
at k ~ 5.0 Mpc -1 . We see that the actual error in the estimated 
EoR 21-cm power spectrum is considerably in excess of the error 
predicted for a Gaussian random field. This discrepancy arises be¬ 
cause the EoR Hi distribution is dominated by several large ionized 
bubbles (left-hand panel of Fig. 1) and the emanating 21-cm signal 
is not a Gaussian random field. 

The diagonal elements ca are the sum of two parts (equation 
34). The first part Af (ki / A ki) is the contribution that would arise 
if the EoR signal were a Gaussian random field. In this case it is 
possible to reduce the error covariance ca by increasing the bin 
width or equivalently combining a larger number of independent 
Fourier modes. Non-Gaussianity, however, introduces an extra term 
tu which is the dimensionless bin averaged trispectrum. As a con¬ 
sequence the dimensionless error variance ca does not decrease 
beyond a limiting value, and it is not possible to decrease the error 
beyond this by increasing the number of Fourier modes in the bin. 

The SE provides an estimate of the total dimensionless error 
variance c,;,, however it is not possible to separately estimate the 
two parts Af ( ki/Aki ) and tu using SE. We have overcome this 
problem by constructing the RSE in which each realization contains 
a mixture of the signal from all realizations of SE. This destroys the 
correlation between the signal at different Fourier modes, and we 
have tu « 0. Since the entire signal in SE is also present in RSE, 
the RSE provides an independent estimate of the ca that would be 
expected if the EoR 21-cm signal were a Gaussian random field (i.e. 


[ch]rse = A \ (ki/Aki)). The right-hand panel of Fig. 4 shows 
[ciijRSE as a function of k. We find that the [c^rse show little 
variation with k with values in the range 2 < [c^rse < 5. This is 
consistent with what we expect from Ai as 1 and A ki/ki as 0.48, 
note that the actual values of Ai and P(ki)/ki vary from bin to bin. 

The difference ca — [ci;] rse gives an estimate of the di¬ 
mensionless bin-averaged trispectrum tu. We find (Fig. 6) that the 
value of tu increases monotonically with k. We have tu ~ 1 for 
k ~ 0.1 Mpc -1 , and it increases quite rapidly with tu ~ 10 
and ~ 10 3 at k ~ IMpc -1 and ~ 5 Mpc -1 respectively. Fig. 
7 shows the ratio tu /[di]nsE. This quantifies the relative magni¬ 
tudes of the two terms which contribute to total error variance Ca, 
here [c«]rse is the error variance that would arise if the EoR 21-cm 
signal were a Gaussian random field and tu is the extra contribu¬ 
tion to the error variance arising from the non-Gaussianity of the 
EoR 21-cm signal. We find tu/[di\ESE > 1 for k > 0.2 Mpc -1 , 
the value of this ratio increases with k and it is ~ 10 and ~ 200 
at k ~ IMpc -1 and k ~ 5Mpc -1 respectively. The two terms 
[cm] rse and tu make roughly equal contributions to cu in the range 
0.2 < k < 0.3 Mpc -1 . The relative contribution from the trispec¬ 
trum increases sharply at k > 0.3 Mpc -1 . 

We find that the error variance is dominated by the trispectrum 
at Fourier modes k > 0.3 Mpc -1 . The error variance would be 
severely underestimated if the EoR 21-cm signal were assumed to 
be a Gaussian random field. We find that the actual error variance 
is predicted to be ~ 11 and ~ 200 times larger than the Gaussian 
prediction at k ~ 1 Mpc -1 and k ~ 5 Mpc -1 respectively. 

We next consider the off-diagonal elements of the error co- 
variance Cij. The off-diagonal elements quantify the correlation 
between the errors in the power spectrum estimated in different k 
bins. The off-diagonal elements are zero for a Gaussian random 
field for which the errors in the different k bins are uncorrelated. 
Non-Gaussianity, however, introduces correlations between the er¬ 
rors at different k bins (equation 31) . We quantify this using the 
dimensionless correlation coefficient nj which has values in the 
range —1 < nj < 1, the values m = 1 and —1 indicating that 
the errors in the i and j bin are fully correlated and anti-correlated 
respectively. Intermediate values (—1 < nj < 1) indicate partial 
correlation or anti-correlation, and r L] = 0 indicates that the er¬ 
rors in the i and j bins are uncorrelated. We have used the SE to 
estimate nj for the EoR 21-cm power spectrum (Fig. 9), and the 
EGRE to establish the statistical significance (Fig. 10). 

We find that the error in the five largest k bins ( k > 
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Figure 10. This shows rij estimated for SE considering both the simulation volumes V 2 (solid) and V\ (dashed). We also show rij estimated from RSE 
(dotted) with V 2 . The shaded region represents the [Srij]c which quantifies the fluctuation of the off-diagonal terms around [r^j ]q = 0 expected for a 
Gaussian random field. 
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0.5Mpc -1 ) are strongly correlated (nj > 0.6). We also find a 
relatively weaker correlation between three smallest k bins (k < 
0.3 Mpc -1 ) and three largest k bins (k > IMpc -1 ). Further, 
the error in the two smallest k bins (k < 0.1 Mpc -1 ) are anti¬ 
correlated with the intermediate bins ~ 0.2 — 0.4 Mpc -1 . We 
find that this anti-correlation is present for both the simulation vol¬ 
umes Vi and Vi (Fig. 9) which are statistically independent. This 
seems to indicate that this anti-correlation is a statistically signifi¬ 
cant effect, however the origin of this anti-correlation is not clear at 
present. 

The non-linear gravitational clustering of the underlying den¬ 
sity field and the presence of discrete ionized regions in the 
Hi distribution both contribute to the non-Gaussianity of the 21- 
cm signal. The non-linear gravitational clustering is particularly 
important at small scales where it leads to the collapse of over- 
dense regions to form gravitationally bound objects that host the 
luminous galaxies that subsequently reionize the universe. Inter¬ 
estingly, the over-densities are also the regions which get ion¬ 
ized first in the inside-out reionization scenario implemented in 
our simulations. Consequently, the over-dense regions are miss¬ 
ing from the 21-cm signal in our simulations, and we expect the 
non-Gaussianity from the non-linear gravitational clustering to be 
subdominant to the non-Gaussianity arising from the ionized bub¬ 
bles in the Hi distribution. This also allows us to interpret the strong 
correlation in the error at the five largest k bins (k > 0.5 Mpc -1 ). 
The length-scales (R < 13Mpc) corresponding to these Fourier 
modes are smaller than the size of the individual ionized regions 
(Fig. 1), and consequently the 21-cm signal in the different modes 
in this k range is highly correlated because it originates from the 
excluded volume of the same ionized regions. Further, the ionized 
regions are centred on the peaks of the density field which them¬ 
selves are expected to have a clustering pattern which is related to 
that of the underlying matter distribution. We therefore expect the 
ionized regions to be correlated with the large-scale clustering of 
the Hi distribution, a fact which is reflected in the correlation be¬ 
tween the errors at large k and small k. 

This work is limited in that we have used a simple model of 
reionization, and the entire analysis is restricted to a situation where 
x H i = 0.5 at z = 8. The predictions will be different for some 
other model of reionization with different ionizing source proper¬ 
ties, inhomogeneous recombinations, fluctuations in the spin tem¬ 
perature etc. While the quantitative predictions are liable to change 
for different reionization scenarios, this work emphasises the fact 
that the non-Gaussian effects will play an important role in the er¬ 
ror predictions for the EoR 21-cm power spectrum. The effect of 
non-Gaussianity is expected to increase further as reionization pro¬ 
ceeds and the neutral fraction falls below xh i = 0.5 (Mondal et al., 
2015). 

There are several experiments like LOFAR, MWA and PAPER 
which are currently underway to measure the EoR 21-cm power 
spectrum, and other instruments like HERA and SKA1 LOW are 
expected to be functional in future. All of these instruments target 
measurements of the EoR 21-cm power spectrum in the k range 
0.1 < k < 2 Mpc -1 . The results of this work clearly show that 
the the errors would be severely underestimated under the Gaus¬ 
sian assumption. A proper treatment of the error covariance matrix 
is crucial for correct error predictions. Such predictions are impor¬ 
tant to assess the prospects of detecting the power spectrum with 
a particular instrument. Further, correct error predictions are also 
important for interpreting the power spectrum subsequent to a de¬ 
tection. In future work we plan to consider ongoing and future EoR 


experiments and carry out comprehensive error analysis including 
the system noise. 
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